Background

Me, my wife, her family, my family, and our friends are crazy about board games. The Idea of gathering together as a group and stepping into another world for a moment in time thrills us. After we married, we started to put together a collection of board games. As the year progressed, my wife and I were trying to figure out what board games we would want to add to our collection and that is when we decided to put my skills to the test. My wife put together a small dataset of boardgames and their attributes and I took that data and modeled graphs to show the best choices of board games for us. After our success, I decided to go bigger and found a stronger dataset full of thousands of boardgames.

As I researched board games I began to see that the games I really enjoyed were published by the same publishers. I came up with a question to guide my research, “Which publisher offers the best variety of high rated board games?”

Analysis

Reading in the data

I joined two datasets together. The main dataset I use came from mrpantherson on Kaggle and the other came from Gabrielle Baldassarre on Kaggle.

library(RSQLite)
library(tidyverse)
library(lubridate)
library(plotly)
library(pander)

fivethousand <- read_csv("C:/Users/james/Documents/git/M335_FA18_Sant_Keat/data/bgg_db_2018_01.csv")
filename <- "C:/Users/james/Documents/git/M335_FA18_Sant_Keat/data/database.sqlite"
sqlite.driver <- dbDriver("SQLite")
db <- dbConnect(sqlite.driver,
                dbname = filename)

#dbListTables(db)
boardgames <- dbReadTable(db,"BoardGames")

Wrangling the data

The structure for the data was not normalized and I found a few columns in which multiple values were placed in the same cell. For example, all the publishers for a game were found in one cell for each game. This would be a problem if I wanted to individually show data on each publisher. Using regular expression, I separated the values and placed each publisher in their own cell.

publisher <- boardgames %>% 
  select(details.name, attributes.boardgamepublisher) %>% 
  rename(names = details.name, publisher = attributes.boardgamepublisher)
  
# join sql publisher dataset with dataset of 5000 rows
games <- fivethousand %>% 
  left_join(publisher, by="names") %>% 
  filter(year > 0) %>% 
  mutate(year = as.integer(year))

# Found pattern and seperated by common character ","
games_p <- games %>%
  {mutate(.,publisher = str_replace_all(publisher,
                                        ",(?=[^\\s])","/"))} 

games_publisher <- games_p %>% 
  separate(publisher, into = paste0(rep("p",67), 1:67), sep = "/") %>% 
  gather(key = "p_id", value = "publisher", 21:87) %>% 
  filter(publisher != 0, !is.na(publisher)) %>% 
  select(-category, -designer, -mechanic, -bgg_url, -image_url, -p_id)

find_pub <- function(pub) {
  games_publisher %>% 
    rename(Complexity = weight, 
           Rating = geek_rating,
           Game = names,
           Time = max_time,
           Players = max_players) %>% 
    filter(publisher == pub, 
           year >= 1990, Time < 400, 
           Players <= 30) %>% 
    ggplot(aes(Complexity, Rating, size = Players, color = Time, group = Game)) +
    geom_point() +
    scale_x_continuous(limits = c(1,5)) +
    labs(title = paste("Board Games", pub, "Helped Publish Since 1990"), color = "Playing Time \n(minutes)", y = "Rating (1-10)", x = "Complexity (1-5)") +
    theme_minimal()
}

Research

As I studied the data I noticed that there were board games recorded 3000 B.C. I also noted that there were board games that required thousands of minutes and about 99 people to play. Because the general population is not interested in those types of board games, I filtered them from the dataset to focus more on the board games we are interested in playing. I took all the board games that were published after 1950 until the beginning of 2018 and created a histogram to show the progression of board games being published over the years. The graph below shows that board games consistantly increased in popularity. There was a huge boom for board games starting in the 90’s and continued to climb all the way until 2015.

I researched what happened in 1990 to have such an affect on board games and I found one of the connections was the internet. The internet brought new ways for publishers to run kickstarters and raise money used to produce board games and it gave them away to easily contact fans to test their board games out before release. I found my research at Financial Review. Also the rise of video games gave board games a boost. It gave an idea that games were not just for children but that adults could also play them.

games_publisher %>% 
  select(names, year) %>% 
  distinct() %>% 
  filter(year > 1950) %>% 
  ggplot(aes(year)) +
  geom_bar(color = "dodgerblue", fill = "dodgerblue4") +
  labs(title = "Quantity of Board Games Published Since 1950", y = "Number of Games Published", caption = "Fig. 1") +
  theme_minimal() +
  theme(axis.title.x = element_blank())


Although there are a lot of really great games, I wanted to focus on the main publishers. Publishers that publish hundreds of games abviously make a lot of good games, otherwise they wouldn’t sell any and they wouldn’t be able to make more. They thrive off the satisfactory of their clients. So I selected the top 10 publishers to see how many board games they publish per year and to see the quality of their games. The two plots below represent this. In Fig. 2 the publishers are in order from most board games published to least board games published.

games_publisher %>% 
  filter(year >= 1990, year != 2018) %>% 
  mutate(year2015 = year == 2015) %>%
  group_by(publisher) %>% 
  mutate(count = n()) %>% 
  ungroup() %>% 
  filter(count >= 167) %>% 
  mutate(publisher = fct_reorder(publisher, count, max, .desc = T)) %>% 
  ggplot(aes(year, fill = year2015)) +
  geom_bar() +
  scale_x_continuous(limits = c(1990, 2020), breaks = seq(1990, 2020, 5)) +
  scale_y_continuous(breaks = seq(0, 30, 15)) +
  scale_fill_manual(values = c("dimgrey", "green")) +
  facet_wrap(~publisher, nrow = 10, strip.position = "right") +
 labs(title = "Top 10 Board Game Publishers Since 1990", subtitle = "2015 had the most board games published in one year", y = "Number of Boardgames Published", x = "Year", caption = "Fig. 2") +
  theme_minimal() +
  theme(strip.text.y = element_text(angle = 0, hjust = 0),
        panel.grid.major = element_blank(),
        panel.grid.minor.x = element_blank(),
        axis.text.y = element_blank(),
        plot.background = element_rect(fill = "beige"),
        panel.grid.minor.y = element_line(color = "grey80"),
        axis.line.x = element_line(),
        axis.ticks.x.bottom = element_line(),
        legend.position = "none",
        axis.title.x = element_blank())

Fig.3 Shows a strong positive correlation between complexity and the rating, but it is interesting to see that the correlation can be interrupted by the length of the game. GMT Games have a lot of board games that take hours to play and the rating decreases the longer it takes to learn to play them.

games_publisher %>% 
  filter(year >= 1990) %>% 
  group_by(publisher) %>% 
  mutate(count = n()) %>% 
  ungroup() %>% 
  filter(count >= 167, max_time < 400, max_players <= 30) %>%
  ggplot(aes(weight, geek_rating, color = max_time)) +
  geom_point() +
  scale_color_gradient(low = "red", high = "yellow", aesthetics = "color") +
  scale_alpha(range = c(.3, .95)) +
  scale_x_continuous(limits = c(0,5)) +
  facet_wrap(~publisher, ncol = 5) +
  labs(title = "Quantity and Rating of Board Games Published By The Top 10 Publishers Since 1990", x = "Learning Complexity (1-5)", y = "Rating (1-10)", color = "Playing time \n(minutes)", alpha = "Maximum \nPlayers", caption = "Fig. 3") +
  theme_minimal() +
  theme(panel.grid.major = element_blank(),
        axis.title.y = element_text(vjust = 3.5),
        axis.title.x = element_text(vjust = -.2),
        panel.spacing = unit(.5, "lines"))


newdf <- games_publisher %>% 
  rename(Publisher = publisher) %>%
  filter(year >= 1990) %>% 
  group_by(Publisher) %>% 
  mutate(Published = n(), `Average Rating` = mean(geek_rating)) %>% 
  ungroup() %>% 
  filter(Published >= 167, max_time < 400, max_players <= 30) 
  
top_10_pub <- newdf %>% 
  select(Publisher, Published, `Average Rating`) %>% 
  distinct() %>% 
  arrange(desc(`Average Rating`))
         
pander(top_10_pub)
Publisher Published Average Rating
Devir 171 6.627
Edge Entertainment 237 6.531
Heidelberger Spieleverlag 250 6.507
Hobby World 167 6.502
Z-Man Games 194 6.446
Asmodee 202 6.445
999 Games 229 6.442
Rio Grande Games 277 6.372
Pegasus Spiele 217 6.339
GMT Games 187 6.07

Top 10 Board Game Publisher’s

Devir

ggplotly(find_pub("Devir"))

Edge Entertainment

ggplotly(find_pub("Edge Entertainment"))

Heidelberger Spieleverlag

ggplotly(find_pub("Heidelberger Spieleverlag"))

Hobby World

ggplotly(find_pub("Hobby World"))

Z-Man Games

ggplotly(find_pub("Z-Man Games"))

Asmodee

ggplotly(find_pub("Asmodee"))

999 Games

ggplotly(find_pub("999 Games"))

Rio Grande Games

ggplotly(find_pub("Rio Grande Games"))

Pegasus Spiele

ggplotly(find_pub("Pegasus Spiele"))

GMT Games

ggplotly(find_pub("GMT Games"))

Conclusion

Each of the individual publisher graphs above show all the games that they participated in publishing (with the filters I specified at the beginning). I say ‘participated’ because a lot of the games were not published by those publishers. As I was looking through each individual plot I found multiple of the same games in each plot. I was confused as to why they had the same games and I did some research and found that a lot of the companies started to merge. Asmodee merged with both Edge Entertainment and Heidelberger Spieleverlag in 2016. The table above shows that Devir ranked highest in their board games but I came to find out that most of those games were not even published by Devir. They just helped with the game.

My original question “Which publisher offers the best variety of high rated board games?” cannot clearly be answered with this data. I will need to continue searching and scraping for the actual publishers. My main drive to answering this question was to provide an interactive visualization to help someone decide what board games they would like to collect. There are a lot of board games in the world and you can’t or don’t want to buy all of them so which ones would give you the right variety. The interactive graphs give you easy access to chosing family games, friend group games, and party games based on complexity and the amount of players.